Development of a stemming algorithm

Author

  • Julie Beth Lovins
Abstract

A stemming algorithm, a procedure to reduce all words with the same stem to a common form, is useful in many areas of computational linguistics and information-retrieval work. While the form of the algorithm varies with its application, certain linguistic problems are common to any stemming procedure. As a basis for evaluation of previous attempts to deal with these problems, this paper first discusses the theoretical and practical attributes of stemming algorithms. Then a new version of a context-sensitive, longest-match stemming algorithm for English is proposed; though developed for use in a library information transfer system, it is of general application. A major linguistic problem in stemming, variation in spelling of stems, is discussed in some detail and several feasible programmed solutions are outlined, along with sample results of one of these methods.
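
The longest-match, context-sensitive idea named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the ending list `ENDINGS`, the single context condition `MIN_STEM_LENGTH`, and the function name `strip_longest_ending` are hypothetical and are not taken from the paper, whose actual ending list and conditions are much larger. The sketch also does not attempt the handling of stem spelling variation that the abstract mentions.

```python
# Hypothetical, tiny ending list for illustration only;
# it is NOT the ending list from the paper.
ENDINGS = ["ational", "ation", "ings", "ing", "ness", "ed", "es", "s"]

# Assumed context condition: an ending may be removed only if
# at least this many letters remain as the stem.
MIN_STEM_LENGTH = 2


def strip_longest_ending(word: str) -> str:
    """Remove the longest listed ending whose context condition holds."""
    word = word.lower()
    # Try longer endings first so that the longest match wins.
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if word.endswith(ending):
            stem = word[: len(word) - len(ending)]
            # Context check: reject the ending if the stem would be too short.
            if len(stem) >= MIN_STEM_LENGTH:
                return stem
    # No ending applies, so the word is returned unchanged.
    return word


if __name__ == "__main__":
    for w in ["relational", "nation", "singing", "sing", "kindness"]:
        print(w, "->", strip_longest_ending(w))
```

In this sketch "relational" loses "ational" because that is the longest matching ending, while "nation" is left intact: removing "ation" would leave a one-letter stem, so the context condition blocks it.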

Similar resources

Investigating the effects of stemming on information retrieval in the Persian language

Using language-specific behavior in information retrieval systems can significantly improve the quality of the retrieved results. The part of a word that remains after its affixes are removed is called the stem. Stemming can be used to improve the relevance of the results in an information retrieval system. Different morphological variants of words (plural, past tense…) will be mapped into t...

Porter’s stemming algorithm for Dutch

A stemming algorithm provides a simple means to enhance Recall in Text Retrieval systems. The paper describes the development of a Dutch version of the Porter stemming algorithm. The stemmer was evaluated using a method inspired by Paice (Paice, 1994). The evaluation method is based on a list of groups of morphologically related words. Ideally, each group must be stemmed to the same root. The r...

A Stemming Algorithm for the Portuguese Language

Stemming algorithms are traditionally used in Information Retrieval with the goal of enhancing recall, as they conflate the variant forms of a word into a common representation. This paper describes the development of a simple and effective suffix-stripping algorithm for Portuguese. The stemmer is evaluated using a method proposed by Paice [9]. The results show that it performs significantly bet...

HPS: High precision stemmer

Research into unsupervised ways of stemming has resulted, in the past few years, in the development of methods that are reliable and perform well. Our approach further shifts the boundaries of the state of the art by providing more accurate stemming results. The idea of the approach consists in building a stemmer in two stages. In the first stage, a stemming algorithm based upon clustering, whi...

A new Arabic stemming algorithm

Text processing is a vital step in the information retrieval process, text mining, and natural language processing. It includes several stages, such as normalization, stop word removal, and stemming. Stemming is the process of reducing the lexicon to its root. Due to the different structures and rules in languages, the task of stemming is language-dependent. This research introduces a new stemm...

Journal title:
  • Mech. Translat. & Comp. Linguistics

Volume 11  Issue 

Pages  -

Publication date 1968